
[orc] Optimize configuration creating in orc file format #4716

Merged (3 commits), Dec 16, 2024


zhangyazhe (Contributor)

Purpose

  1. This PR speeds up the creation of OrcFileFormat instances, with an overall improvement of at least 500x in the performance tests. The performance tests are in the test file listed under Tests.
  2. In real workloads, creating an OrcFileFormat took about 6 ms before this change. With roughly 7 creations per query, scan time was 40+ ms, and under high load (50+ QPS) it could spike to several hundred milliseconds. After this change, creating an OrcFileFormat takes about 0.2 ms; with the same ~7 creations per query, scan time drops to about 2 ms or less.
  3. Optimization principle: every time a Hadoop Configuration is created with its default resources and then set, it scans and parses resource files on the classpath (the default resources, such as core-default.xml and core-site.xml). Avoiding that repeated scan on every creation is what removes most of the cost.
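As a rough cross-check of the figures in item 2, assuming the ~7 creations per query run one after another on the scan path (an assumption; the PR does not say how they are scheduled), configuration creation alone accounts for:

```latex
\begin{aligned}
\text{before: } & 7 \times 6\ \text{ms} = 42\ \text{ms per query} \\
\text{after: }  & 7 \times 0.2\ \text{ms} = 1.4\ \text{ms per query}
\end{aligned}
```

The ~42 ms before the change is in line with the reported 40+ ms scan time, and the ~1.4 ms after it fits within the reported 2 ms or less.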

Tests

paimon-format/src/test/java/org/apache/paimon/format/orc/OrcFileFormatTest.java
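The performance tests themselves are not reproduced on this page, so the following is only a hypothetical sketch (the class CreationBenchmark and the method averageNanos are made-up names, not the contents of OrcFileFormatTest) of how per-instance creation cost can be measured with plain JDK timing. For serious measurements a harness such as JMH is the usual choice.

```java
import java.util.Collections;
import java.util.function.Supplier;

/**
 * Hypothetical micro-benchmark helper: times repeated construction of an
 * object and reports the average cost per construction in nanoseconds.
 */
public class CreationBenchmark {

    /** Keeps constructed objects reachable so the JIT cannot drop them as dead code. */
    static volatile Object sink;

    static double averageNanos(Supplier<?> factory, int warmup, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        // Warm-up runs give the JIT a chance to compile the construction path first.
        for (int i = 0; i < warmup; i++) {
            sink = factory.get();
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink = factory.get();
        }
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        // Stand-in factories. With Hadoop on the classpath, the "before" and "after"
        // cases would be lambdas that call new Configuration() or new Configuration(false)
        // and then set(...) a value, since default resources are only read on first access.
        Supplier<StringBuilder> small = StringBuilder::new;
        Supplier<String> larger = () -> String.join(",", Collections.nCopies(1_000, "x"));
        System.out.printf("small:  %.1f ns%n", averageNanos(small, 1_000, 10_000));
        System.out.printf("larger: %.1f ns%n", averageNanos(larger, 1_000, 10_000));
    }
}
```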

Note: This PR is split from #4497

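Review thread on the OrcFileFormat constructor:

    public OrcFileFormat(FormatContext formatContext) {
        super(IDENTIFIER);
        this.orcProperties = getOrcProperties(formatContext.options(), formatContext);
        this.readerConf = new org.apache.hadoop.conf.Configuration();
        // ... (rest of the constructor not shown in the review excerpt)
    }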
Contributor:

I just used new Configuration(false), and the tests passed. Do you think it is OK to do it this way?

Contributor Author:

yes
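To illustrate why the reviewer's change helps, here is a toy Java model. All names in it (LazyConfDemo, LazyConf, loadsForQuery) are hypothetical and it is not Hadoop code; it only mimics one behavior of org.apache.hadoop.conf.Configuration: a boolean constructor argument decides whether default resources are used, and those resources are read lazily on first access, so a create-then-set pattern pays the load on every new instance. The counter stands in for the file scan.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of lazy default-resource loading; not Hadoop's implementation. */
public class LazyConfDemo {

    /** Counts simulated resource loads across all instances. */
    static int resourceLoads = 0;

    static class LazyConf {
        private final boolean loadDefaults;
        private Map<String, String> props; // null until first access

        LazyConf(boolean loadDefaults) {
            this.loadDefaults = loadDefaults;
        }

        private Map<String, String> getProps() {
            if (props == null) {
                props = new HashMap<>();
                if (loadDefaults) {
                    // Stand-in for reading and parsing default resource files.
                    resourceLoads++;
                    props.put("fs.defaultFS", "file:///");
                }
            }
            return props;
        }

        void set(String key, String value) {
            getProps().put(key, value);
        }

        String get(String key) {
            return getProps().get(key);
        }
    }

    /** Simulates one query that creates a configuration several times. */
    static int loadsForQuery(boolean loadDefaults, int creationsPerQuery) {
        int before = resourceLoads;
        for (int i = 0; i < creationsPerQuery; i++) {
            LazyConf conf = new LazyConf(loadDefaults);
            conf.set("orc.compress", "zstd"); // first write triggers the lazy load
        }
        return resourceLoads - before;
    }

    public static void main(String[] args) {
        System.out.println("with defaults:    " + loadsForQuery(true, 7));
        System.out.println("without defaults: " + loadsForQuery(false, 7));
    }
}
```

Running main prints 7 loads with defaults enabled and 0 without. One trade-off of new Configuration(false) in the real class is that it does not pick up anything from core-default.xml or core-site.xml, so a reader setting that used to be inherited from those files would need to be set explicitly.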

JingsongLi changed the title from "[orc][format] add cache for OrcFileFormat" to "[orc] Optimize configuration creating in orc file format" on Dec 16, 2024
JingsongLi merged commit 683fa19 into apache:master on Dec 16, 2024
11 of 12 checks passed